Generalised Entropy MDPs and Minimax Regret

Authors

  • Emmanouil G. Androulakis
  • Christos Dimitrakakis
Abstract

Bayesian methods suffer from the problem of how to specify prior beliefs. One interesting idea is to consider worst-case priors. This requires solving a stochastic zero-sum game. In this paper, we extend well-known results from bandit theory in order to discover minimax-Bayes policies and discuss when they are practical.

Similar resources

A Geometric Traversal Algorithm for Reward-Uncertain MDPs

Markov decision processes (MDPs) are widely used in modeling decision making problems in stochastic environments. However, precise specification of the reward functions in MDPs is often very difficult. Recent approaches have focused on computing an optimal policy based on the minimax regret criterion for obtaining a robust policy under uncertainty in the reward function. One of the core tasks i...

Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies

The precise specification of reward functions for Markov decision processes (MDPs) is often extremely difficult, motivating research into both reward elicitation and the robust solution of MDPs with imprecisely specified reward (IRMDPs). We develop new techniques for the robust optimization of IRMDPs, using the minimax regret decision criterion, that exploit the set of nondominated policies, i....

Robustness in portfolio optimization based on minimax regret approach

Portfolio optimization is one of the most important issues for effective and economic investment. There is plenty of research in the literature addressing this issue. Most of these studies attempt to make Markowitz's primary portfolio selection model more realistic or seek to solve the model to obtain fairly optimal portfolios. An efficient frontier in the ...

Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)

Markov Decision Processes (MDPs) are an effective model to represent decision processes in the presence of transitional uncertainty and reward tradeoffs. However, due to the difficulty in exactly specifying the transition and reward functions in MDPs, researchers have proposed uncertain MDP models and robustness objectives in solving those models. Most approaches for computing robust policies h...

Fighting Bandits with a New Kind of Smoothness

We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion of convex smoothing, we define a novel family of algorithms with minimax optimal regret guarantees. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, matches the O(√(NT)) minimax regret with a smaller constant factor. Second, we show that a wide ...

Journal:
  • CoRR

Volume: abs/1412.3276

Publication date: 2014